The SARS epidemic has boosted interest in research on 
coronavirus biodiversity and genomics. Before 2003, there were 
only 10 coronaviruses with complete genomes available. After 
the SARS epidemic, up to December 2008, there was an addition 
of 16 coronaviruses with complete genomes sequenced. These 
include two human coronaviruses (human coronavirus NL63 
and human coronavirus HKU1), 10 other mammalian coronaviruses 
[bat SARS coronavirus, bat coronavirus (bat-CoV) HKU2, 
bat-CoV HKU4, bat-CoV HKU5, bat-CoV HKU8, bat-CoV HKU9, 
bat-CoV 512/2005, bat-CoV 1A, equine coronavirus, and beluga 
whale coronavirus] and four avian coronaviruses (turkey 
coronavirus, bulbul coronavirus HKU11, thrush coronavirus 
HKU12, and munia coronavirus HKU13). Two novel subgroups 
in group 2 coronavirus (groups 2c and 2d) and two novel 
subgroups in group 3 coronavirus (groups 3b and 3c) have been 
proposed. The diversity of coronaviruses is a result of the 
infidelity of RNA-dependent RNA polymerase, high frequency of 
homologous RNA recombination, and the large genomes of 
coronaviruses. Among all hosts, the diversity of coronaviruses 
is most evidenced in bats and birds, which may be a result of 
their species diversity, ability to fly, environmental pressures, 
and habits of roosting and flocking. The present evidence 
supports that bat coronaviruses are the gene pools of group 1 
and 2 coronaviruses, whereas bird coronaviruses are the gene 
pools of group 3 coronaviruses. With the increasing number of 
coronaviruses, more and more closely related coronaviruses 
from distantly related animals have been observed, which were 
results of recent interspecies jumping and may be the cause of 
disastrous outbreaks of zoonotic diseases. Exp Biol Med 
234:1117-1127, 2009 

Key words: coronavirus; genome; diversity; phylogeny; interspecies 
jumping 


Introduction 

Among the 7800 “coronavirus” papers found by 
MEDLINE search, almost half of them were published 
after the SARS epidemic, which has boosted interest in all 
directions of coronavirus research, most notably coronavirus 
biodiversity and genomics (1). Infectious bronchitis 
virus (IBV), the first coronavirus discovered, was isolated 
from chicken embryos in 1937 (2). This was followed by 
mouse hepatitis virus (MHV) and other mammalian 
coronaviruses in the 1940s (3, 4). The two human 
coronaviruses, human coronavirus 229E (HCoV-229E) 
and human coronavirus OC43 (HCoV-OC43), were discovered 
in the 1960s (5, 6). Before 2003, there were only 10 
coronaviruses with complete genomes available, with two 
human coronaviruses (HCoV-229E and HCoV-OC43), 
seven other mammalian coronaviruses [MHV, bovine 
coronavirus (BCoV), porcine hemagglutinating encephalomyelitis 
virus (PHEV), transmissible gastroenteritis virus 
(TGEV), porcine epidemic diarrhea virus (PEDV), porcine 
respiratory coronavirus (PRCV), and feline coronavirus 
(FCoV)] and one avian coronavirus (IBV) (Table 1, Fig. 1a). 
These coronaviruses were classified into three groups, 
with groups 1 and 2 comprising the nine mammalian 
coronaviruses and group 3 the avian coronavirus (Fig. 1a) 
(7-9). 

Table 1. Genomic Features of Coronaviruses with Complete Genomes Available 

                     Size      G + C                  No. of nsp   No. of   ORF downstream
Coronaviruses (a)    (bases)   content   TRS          in ORF1ab    PLpro    to N

Group 1a
PEDV                 28033     0.42      CUAAAC       16           2        1
TGEV                 28586     0.38      CUAAAC (b)   16           2        1
FCoV                 29355     0.38      CUAAAC       16           2        2
Group 1b
HCoV-229E            27317     0.38      CUAAAC       16           2        -
HCoV-NL63            27553     0.34      CUAAAC       16           2        -
Bat-CoV 512/2005     28203     0.40      CUAAAC       16           2        1
Bat-CoV HKU2         27165     0.39      CUAAAC       16           2        1
Bat-CoV HKU8         28773     0.42      CUAAAC       16           2        1
Bat-CoV 1A           28326     0.38      CUAAAC       16           2        -
Group 2a
HCoV-OC43            30738     0.37      CUAAAC (b)   16           2        -
BCoV                 31028     0.37      CUAAAC (b)   16           2        -
PHEV                 30480     0.37      CUAAAC (b)   16           2        -
ECoV                 30992     0.37      CUAAAC (b)   16           2        -
MHV                  31357     0.42      CUAAAC (b)   16           2        -
HCoV-HKU1            29926     0.32      CUAAAC (b)   16           2        -
Group 2b
SARS-CoV             29751     0.41      ACGAAC       16           1        -
Bat-SARS-CoV HKU3    29728     0.41      ACGAAC       16           1        -
Group 2c
Bat-CoV HKU4         30286     0.38      ACGAAC       16           1        -
Bat-CoV HKU5         30488     0.43      ACGAAC       16           1        -
Group 2d
Bat-CoV HKU9         29114     0.41      ACGAAC       16           1        2
Group 3a
IBV                  27608     0.38      CUUAACAA     15           1        -
TCoV                 27657     0.38      CUUAACAA     15           1        -
Group 3b
SW1                  31686     0.39      AAACA        15           1        -
Group 3c
BuCoV HKU11          26476     0.39      ACACCA       15           1        3
ThCoV HKU12          26396     0.38      ACACCA       15           1        3
MuCoV HKU13          26552     0.43      ACACCA       15           1        3

(a) HCoV-229E, human coronavirus 229E; PEDV, porcine epidemic diarrhea virus; TGEV, porcine transmissible gastroenteritis virus; 
HCoV-NL63, human coronavirus NL63; FCoV, feline coronavirus; bat-CoV 512/2005, bat coronavirus 512/2005; bat-CoV HKU2, bat coronavirus 
HKU2; bat-CoV 1A, bat coronavirus 1A; bat-CoV HKU8, bat coronavirus HKU8; HCoV-HKU1, human coronavirus HKU1; HCoV-OC43, human 
coronavirus OC43; MHV, mouse hepatitis virus; BCoV, bovine coronavirus; PHEV, porcine hemagglutinating encephalomyelitis virus; ECoV, 
equine coronavirus; SARS-CoV, SARS coronavirus; bat-SARS-CoV HKU3, bat SARS coronavirus HKU3; bat-CoV HKU4, bat coronavirus 
HKU4; bat-CoV HKU5, bat coronavirus HKU5; bat-CoV HKU9, bat coronavirus HKU9; IBV, infectious bronchitis virus; TCoV, turkey 
coronavirus; SW1, beluga whale coronavirus; BuCoV HKU11, Bulbul coronavirus HKU11; ThCoV HKU12, Thrush coronavirus HKU12; MuCoV 
HKU13, Munia coronavirus HKU13. 

(b) Internal ribosomal entry site is employed for orf3b of TGEV and E of group 2a coronaviruses. 
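Two of the columns in Table 1, G + C content and the TRS core motif, are simple to compute or search for once a genome sequence is in hand. The following is a minimal Python sketch under stated assumptions: the function names `gc_content` and `find_motif` are hypothetical, and the short RNA fragment `toy` is invented for illustration and is not taken from any real genome. Real TRS annotation also depends on sequence context and position relative to each ORF, which this sketch does not attempt.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) motif hit.

    Both inputs are normalized to RNA letters so DNA-style 'T' also works.
    """
    seq = seq.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


# Toy RNA fragment, invented for illustration only (not a real genome).
toy = "AUGACGAACUUUGGCACGAACCCUAAACGG"

print(gc_content(toy))            # 0.5
print(find_motif(toy, "ACGAAC"))  # [3, 15]  (motif listed for groups 2b, 2c, 2d)
print(find_motif(toy, "CUAAAC"))  # [22]     (motif listed for groups 1 and 2a)
```

Applied to a complete genome, `gc_content` would give a value comparable to the G + C column, and `find_motif` would list candidate TRS positions for manual inspection.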

After the SARS epidemic, up to December 2008, complete 
genomes were sequenced for an additional 16 coronaviruses. 
These include two globally distributed human coronaviruses, 
human coronavirus NL63 (HCoV-NL63) and human coronavirus 
HKU1 (HCoV-HKU1) (10-26); 10 other mammalian coronaviruses, 
bat SARS coronavirus (bat-SARS-CoV), bat coronavirus 
(bat-CoV) HKU2, bat-CoV HKU4, bat-CoV HKU5, bat-CoV HKU8, 
bat-CoV HKU9, bat-CoV 512/2005, bat-CoV 1A, equine 
coronavirus, and beluga whale coronavirus (SW1) (27-35); and 
four avian coronaviruses, turkey coronavirus (TCoV), bulbul 
coronavirus HKU11 (BuCoV HKU11), thrush coronavirus HKU12 
(ThCoV HKU12), and munia coronavirus HKU13 (MuCoV HKU13) 
(Table 1, Fig. 1b) (36, 37). Moreover, two novel subgroups in 
group 2 coronavirus (groups 2c and 2d) and two novel subgroups 
in group 3 coronavirus (groups 3b and 3c) have been proposed 
(33, 37). Recently, the Coronavirus Study Group of the 
International Committee on Taxonomy of Viruses (ICTV) has 
proposed three genera, Alphacoronavirus, Betacoronavirus, and 
Gammacoronavirus, to replace the traditional groups 1, 2, and 3 
coronaviruses 
(http://talk.ictvonline.org/cfs-filesystemfile.ashx/_key/CommunityServer.Components.PostAttachments/00.00.00.06.26/2008.085_2D00_122V.01.Coronaviridae.pdf). 

Figure 1. Phylogenetic analysis of RNA-dependent RNA polymerases (Pol) of the 10 coronaviruses with complete genome sequences 
available before SARS (panel A), and that of all coronaviruses with complete genome sequences available by the end of 2008 (panel B). The 
trees were constructed by neighbor joining method using Kimura’s two-parameter correction and bootstrap values calculated from 1000 trees. 
948 and 958 amino acid positions in Pol were included in the two analyses, respectively. The scale bars indicate the estimated number of 
substitutions per 10 amino acids. HCoV-229E, human coronavirus 229E (NC_002645); PEDV, porcine epidemic diarrhea virus (NC_003436); 
TGEV, porcine transmissible gastroenteritis virus (NC_002306); FCoV, feline coronavirus (AY994055); PRCV, porcine respiratory coronavirus 
(DQ811787); HCoV-NL63, human coronavirus NL63 (NC_005831); bat-CoV-HKU2 (EF203064), HKU4 (NC_009019), HKU5 (NC_009020), 
HKU8 (NC_010438), HKU9 (NC_009021), 1A (NC_010437), 1B (NC_010436), 512/2005 (NC_009657); HCoV-HKU1, human coronavirus 
HKU1 (NC_006577); HCoV-OC43, human coronavirus OC43 (NC_005147); MHV, mouse hepatitis virus (NC_006852); BCoV, bovine 
coronavirus (NC_003045); PHEV, porcine hemagglutinating encephalomyelitis virus (NC_007732); ECoV, equine coronavirus (NC_010327); 
SARS-CoV, SARS coronavirus (NC_004718); bat-SARS-CoV-HKU3, bat-SARS coronavirus HKU3 (NC_009694); IBV, infectious bronchitis 
virus (NC_001451); TCoV, turkey coronavirus (NC_010800); SW1, beluga whale coronavirus (NC_010646); BuCoV-HKU11, Bulbul coronavirus 
HKU11 (NC_011548); ThCoV-HKU12, Thrush coronavirus HKU12 (NC_011549); MuCoV-HKU13, Munia coronavirus HKU13 (NC_011550). A 
color version of this figure is available in the online journal. 

The diversity of coronaviruses results from three major 
factors. First, the infidelity of the RNA-dependent RNA 
polymerase of coronaviruses puts their mutation rates in 
the order of one per 1000 to 10000 nucleotides replicated, 
which makes them especially plastic (38, 39). Second, as a 
result of their unique random template switching during 
RNA replication, thought to be mediated by a “copy-choice” 
mechanism, coronaviruses have a high frequency of 
homologous RNA recombination (40, 41). Third, 
coronaviruses possess the largest genomes (26.4-31.7 kb) 
among all known RNA viruses, which gives this family of 
viruses extra plasticity in accommodating and modifying 
genes. These three factors have led not only to the 
generation of a diversity of strains and genotypes within one 
coronavirus species, but also to new species that are able 
to adapt to new hosts and ecological niches, sometimes 
causing major zoonotic outbreaks with disastrous consequences 
(42). As a result of the numerous coronaviruses 
discovered and genomes sequenced in the past few years, 
our understanding of the diversity, genomics, and phylogeny 
of coronaviruses has greatly improved. In this article, we 
review the recent work by us and others on coronavirus 
diversity and genomics, with an emphasis on phylogeny and 
interspecies jumping. 
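To make the first and third factors concrete, the quoted error rate can be multiplied by genome size to estimate how many mutations to expect each time a genome is copied. The sketch below is a back-of-envelope calculation only: it assumes errors are independent and uniformly distributed along the genome, which the text does not claim, and the function name `expected_mutations` is hypothetical. The two genome sizes are the smallest and largest entries in Table 1.

```python
# Error-rate range quoted in the text: one mutation per 1000 to 10000 nucleotides.
RATE_HIGH = 1 / 1_000
RATE_LOW = 1 / 10_000

# Smallest and largest genome sizes listed in Table 1 (bases).
SMALLEST = 26_396  # ThCoV HKU12
LARGEST = 31_686   # SW1


def expected_mutations(genome_size: int, rate_per_nt: float) -> float:
    """Expected mutations per genome copy, assuming independent, uniform errors."""
    return genome_size * rate_per_nt


for name, size in (("ThCoV HKU12", SMALLEST), ("SW1", LARGEST)):
    low = expected_mutations(size, RATE_LOW)
    high = expected_mutations(size, RATE_HIGH)
    print(f"{name}: {low:.1f} to {high:.1f} expected mutations per genome copy")
```

Under these assumptions, every genome copy is expected to carry roughly 3 to 30 changes, which illustrates why a large genome combined with a low-fidelity polymerase yields so much variation.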

Group 1 Coronaviruses (Alphacoronavirus) 

Among the three groups of coronaviruses, the phylogeny 
of group 1 coronaviruses is the least well understood. 
Although it has been proposed that group 1 coronaviruses 
can be subdivided into groups 1a and 1b based on 
phylogenetic clustering of group 1a coronaviruses and 
>90% overall genome identity among the members of this 
subgroup (Fig. 1b), no additional genomic evidence, such as 
gene contents, transcription regulatory sequence (TRS), or 
other unique genomic features of the kind seen in the 
subgroups of groups 2 and 3 coronaviruses described below, 
supports such a subclassification. For the group 1b 
coronaviruses, in addition to the lack of common genomic 
features, there is no phylogenetic clustering (Fig. 1b). 
Therefore, the group 1b coronaviruses are in fact 
“non-group 1a” coronaviruses, rather than a distinct lineage 
defined by common features. In the recent proposal of the 
Coronavirus Study Group of the ICTV 
(http://talk.ictvonline.org/cfs-filesystemfile.ashx/_key/CommunityServer.Components.PostAttachments/00.00.00.06.26/2008.085_2D00_122V.01.Coronaviridae.pdf), 
Geselavirus was proposed as the name for 
group 1a coronavirus. Although the genomes of all the 
members of this subgroup contain one (NS7a) or two 
(NS7a and 7b) ORFs downstream to N, hence the name 
Geselavirus, which stands for “gene seven last,” the 
genomes of some group 1b coronaviruses, such as bat-CoV 
HKU8, also contain an ORF downstream to N. 

Although the present sub-classification of group 1 
coronaviruses into groups 1a and 1b may not be ideal, the 
best documented example of generation of coronavirus 
species through homologous recombination is found in 
group 1a coronavirus: the generation of FCoV [also 
called feline infectious peritonitis virus (FIPV) in some 
publications] type II strains by double recombination 
between FCoV (FIPV) type I strains and canine coronavirus 
(CCoV). It was originally observed that the sequence of S in 
type II FCoV was closely related to that of CCoV (43, 44) 
but the sequence downstream of E in type II FCoV was 
closely related to that of type I FCoV (45, 46). This suggests 
that there may have been a homologous RNA recombination 
event between the 3' ends of the genomes of CCoV and 
type I FCoV, giving rise to a type II FCoV genome. Further 
analysis by multiple alignments pinpointed the site of 
recombination to a region in the E gene. A few years later, 
Herrewegh et al. further discovered an additional recombination 
region in the pol gene, and they concluded that type 
II FCoV in fact originated from two recombination events 
between genomes of CCoV and type I FCoV (47). 
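The reasoning behind locating recombination sites by alignment can be illustrated with a sliding-window identity scan: along an alignment, the recombinant resembles one parent in some windows and the other parent elsewhere, and the switch points bracket the crossover sites. The following Python sketch is illustrative only and is not the method used in the cited studies. The function names, the window settings, and the toy sequences (`parent_a`, `parent_b`, `query`) are all hypothetical; the toy query is built with two crossovers to mimic a double recombination.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length aligned strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def window_scan(query, parent_a, parent_b, window, step):
    """For each window, report which parent the query resembles more closely."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        id_a = identity(query[start:end], parent_a[start:end])
        id_b = identity(query[start:end], parent_b[start:end])
        closer = "A" if id_a > id_b else "B" if id_b > id_a else "tie"
        results.append((start, id_a, id_b, closer))
    return results


# Toy aligned sequences, invented for illustration. The query copies parent A
# at both ends and parent B in the middle, i.e. two crossover points.
parent_a = "ACGT" * 10
parent_b = "TGCA" * 10
query = parent_a[:10] + parent_b[10:30] + parent_a[30:]

for start, id_a, id_b, closer in window_scan(query, parent_a, parent_b, 10, 10):
    print(start, f"{id_a:.2f}", f"{id_b:.2f}", closer)
```

In this toy case the closer parent switches from A to B and back to A, so the two crossovers fall near positions 10 and 30. Real analyses work on multiple sequence alignments with far more similar parents and need statistical support before a breakpoint is called.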

Group 2 Coronaviruses (Betacoronavirus) 

Among the three groups of coronaviruses, the greatest 
improvement in our understanding of coronavirus phylogeny 
lies in group 2 coronaviruses. Before the discovery of 
SARS-CoV, group 2 coronaviruses were considered to 
include one lineage, with all members possessing 
haemagglutinin esterase genes and two papain-like proteases 
(PL1pro and PL2pro) in nsp3 of ORF1ab (Fig. 2). When 
SARS-CoV was first identified and its genome sequenced, it 
was proposed that it constituted a fourth group of 
coronavirus (48, 49). However, after more extensive 
analyses of the amino-terminal domain of S of SARS-CoV, 
it was observed that 19 out of the 20 cysteine residues 
were spatially conserved with those of the consensus 
sequence for group 2 coronaviruses (50). In contrast, only 
five of the cysteine residues were spatially conserved with 
those of the consensus sequences in group 1 and group 3 
coronaviruses (50). Furthermore, using both genomic and 
proteomic approaches, it was shown that SARS-CoV is 
probably an early split-off from the group 2 coronavirus 
lineage (51). Therefore, SARS-CoV was subsequently 
classified as a group 2b coronavirus and the historical 
group 2 coronaviruses were classified as group 2a 
coronaviruses. In 2005, we and others described the 
discovery of SARS-CoV-like viruses from at least four 
species of horseshoe bats in Hong Kong (Rhinolophus 
sinicus ) and mainland China (Rhinolophus ferrumequinum, 
Rhinolophus macrotis, and Rhinolophus pearsoni) (31, 34). 
These bat SARS-CoV were closely related to SARS-CoV 
found in humans and civets, with the pol, helicase, and N 
genes possessing more than 95% amino acid similarity with 
the corresponding ones in SARS-CoV from humans and 
civets. The greatest difference between the genomes of bat 
SARS-CoV and human and civet SARS-CoV lay in the 
sars3, sars8, and S genes, with amino acid identities as low 
as 33% between the sars8 gene in bat SARS-CoV and those 
in human and civet SARS-CoV. These three genes were 
also the three genes that showed the greatest variations in 
the various genomes of human and civet SARS-CoV (31). 
Despite the finding of bat-SARS-CoV, its S protein only 
shared 79-80% amino acid identity to that of SARS-CoV, 
suggesting that SARS-CoV may have acquired a distinct S 
protein that has allowed interspecies transmission. In our 
previous studies, another novel group 1 coronavirus, bat- 
CoV-HKU2, was also found in Chinese horseshoe bats, the 
same bat species that harbors bat-SARS-CoV (29). Since 
co-infection of the same bat species by two different 
coronaviruses may have allowed the opportunities for 
recombination, the genome sequences of bat-CoV-HKU2 
were compared to SARS-CoV-like viruses to reveal 
possible recombination events. Bat-CoV HKU2 was found 
to possess a unique spike protein evolutionarily distinct 
from the rest of the genome. Its spike protein, sharing 
similar deletions with other group 2 coronaviruses in its 
C-terminus, also contained a 15-amino acid peptide homologous 
to a corresponding peptide within the receptor binding 
motif of SARS-CoV spike protein, which was absent in 
other coronaviruses, except bat-SARS-CoV. Although no 
recombination events could be identified, the results suggest 
a common evolutionary origin in the spike proteins between 
bat-CoV HKU2 (a group 1 coronavirus) and bat-SARS-CoV 
and SARS-CoV (group 2 coronaviruses). It is also noteworthy 
that at least one member of group 1b, HCoV-NL63, 
also uses ACE2 as the receptor for cell entry, as in the case 
of SARS-CoV, though the site of binding on ACE2 is 
different (52, 53). 

In 2006 and 2007, we proposed two additional 
subgroups of group 2 coronaviruses: group 2c and group 
2d (33). These two subgroups form two unique lineages, 
most closely related to, but distinct from group 2a and group 
2b coronaviruses. In addition to phylogenetic evidence, there 
is also clear-cut evidence from gene contents and other 
genomic features that four subgroups exist in group 2 
coronaviruses. For the gene contents of the genomes of 


Downloaded from ebm.sagepub.com at UZH Hauptbibliothek / Zentralbibliothek Zurich on July 6, 2014 


CORONAVIRUS DIVERSITY AND PHYLOGENY 




WOO ET AL 


group 2a coronaviruses, they possess PL1pro and PL2pro in 
nsp3 of ORF1ab, but group 2b, 2c, and 2d coronaviruses 
only possess one PLpro, which is homologous to PL2pro. 
Furthermore, the genomes of group 2a, but not those of 
group 2b, 2c, and 2d coronaviruses, encode haemagglutinin 
esterase. For group 2b coronaviruses, their genomes, but not 
those of group 2a, 2c, and 2d coronaviruses, contain several 
small ORFs between the M and N genes. As for group 2d 
coronaviruses, their genomes, but not those of group 2a, 2b, 
and 2c coronaviruses, contain two ORFs downstream to the 
N gene. As for the TRS, the sequence for the TRS of group 
2a coronaviruses is CUAAAC and that of group 2b, 2c, and 
2d coronaviruses is ACGAAC (33, 54-56). For the E gene, 
TRS is present in group 2b, 2c, and 2d, but not group 2a, 
coronaviruses, in which an internal ribosomal entry site is 
used for their translation (33, 48, 49, 57). The genomes of 
group 2a, 2b, and 2c contain clear-cut overlapped bulged 
stem-loop and pseudoknot structures at the 3' untranslated 
region and immediately downstream to N. On the other 
hand, whether the genomes of group 2d coronaviruses 
possess similar bulged stem-loop and pseudoknot structures 
is controversial. Obviously, the genome of bat CoV-HKU9, 
the only member of group 2d coronaviruses identified so far, 
does not possess the classical bulged stem-loop and 
pseudoknot structures immediately downstream to N that 
were present in the genomes of group 2a, 2b, and 2c 
coronaviruses (33). Although it has been suggested that a 
candidate pseudoknot structure could be present at 1073 
bases downstream to N and a predicted candidate bulged 
stem-loop can be found upstream to it (58), the part occupied 
by the predicted candidate bulged stem-loop belongs to the 
putative coding region of NS7b, which is probably an ORF 
that is expressed because of the presence of TRS. 
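The gene-content differences described above amount to a small decision table. The sketch below encodes them in Python as a reading aid, not as a classification method proposed in the text: the class `Group2Features` and the function `classify_group2` are hypothetical names, only four of the listed features are used, and group 2c is assigned by elimination, which is an assumption of this sketch (the text separates the subgroups primarily by phylogeny).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group2Features:
    n_plpro: int             # papain-like proteases in nsp3 of ORF1ab
    has_he: bool             # haemagglutinin esterase gene present
    small_orfs_m_to_n: bool  # several small ORFs between the M and N genes
    n_orfs_after_n: int      # ORFs downstream to the N gene


def classify_group2(f: Group2Features) -> str:
    """Assign a group 2 subgroup from the gene-content features in the text."""
    if f.n_plpro == 2 and f.has_he:
        return "2a"
    if f.n_plpro == 1 and not f.has_he:
        if f.small_orfs_m_to_n:
            return "2b"
        if f.n_orfs_after_n == 2:
            return "2d"
        # Assumption of this sketch: 2c is what remains after excluding 2b and 2d.
        return "2c"
    return "unclassified"


examples = {
    "2a-like": Group2Features(2, True, False, 0),
    "2b-like": Group2Features(1, False, True, 0),
    "2c-like": Group2Features(1, False, False, 0),
    "2d-like": Group2Features(1, False, False, 2),
}
for label, feats in examples.items():
    print(label, "->", classify_group2(feats))
```

A fuller version would also take the TRS core sequence into account (CUAAAC for group 2a, ACGAAC for groups 2b, 2c, and 2d), as given in Table 1.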

Extensive homologous and heterologous recombination 
events have been documented in both human and animal 
group 2 coronaviruses, which has led to the generation of 
various genotypes and strains within a coronavirus species, 
as well as acquisition of new genes from other non- 
coronavirus RNA donors. MHV is one of the most extensively 
studied examples of homologous recombination in coronaviruses, 
and is also the coronavirus in which homologous recombination 
was first observed. Over 20 years ago, Lai et al. first observed 
homologous recombination as a result of mixed infection of 
DBT cells with MHV strains A59 and JHM (59). Genome 
analysis showed that the recombinant strain contained 
sequences from both parents and one crossover site. 
Subsequently, homologous recombination in MHV was 
further observed in tissue culture (59, 60) and experimentally 
infected animals (61). In the 1990s, it was found that as 
much as 25% of MHV were recombinants (62, 63). 
Furthermore, in vitro studies have shown variations in both 
sites and rates of recombination, with the S gene having a 
frequency threefold that of the pol gene (60, 63). As for 
human coronavirus, the most studied example was HCoV-HKU1. 
In our complete genome sequencing and 
phylogenetic studies of 22 strains of HCoV-HKU1, 
extensive recombination in different parts of the genomes 
was observed, which has led to the generation of three 
genotypes, A, B, and C, of HCoV-HKU1 (64). The two 
most notable examples were observed in a stretch of 143 
nucleotides near the 3' end of nsp6, where recombination 
between genotypes B and C led to generation of genotype 
A, and in another stretch of 29 nucleotides near the 3' end of 
nsp16, where recombination between genotypes A and B 
led to generation of genotype C. This represented the first 
example of recombination in human coronavirus and was 
also the first report to describe a distribution of the 
recombination spots in the entire genome of field isolates 
of a coronavirus. As for the acquisition of new genes from 
non-coronavirus RNA donors by heterologous recombination, 
the most notable example is the HE gene from 
influenza C virus (65, 66). The presence of HE genes in 
group 2a, but not other group 2, coronaviruses suggested 
that the recombination had probably occurred in the 
ancestor of group 2a coronaviruses, after diverging from 
the ancestor of other group 2 coronaviruses. 

Group 3 Coronaviruses (Gammacoronavirus) 

Dramatic improvement in our understanding of the 
diversity and phylogeny, and potential interspecies jumping, 
of group 3 coronaviruses occurred in the last year. Since its 
discovery in 1937, IBV has been the only species of group 3 
coronavirus for over 50 years. In the last decade of the last 
century and the first few years of the 21st century, a few 
IBV-like viruses, including TCoV, have been described in 
various species of birds, with some of their genomes 
sequenced (36, 67-69). The sizes, G + C contents, and 
genome organizations of their genomes were similar, 
indicating that they probably have diverged from the same 
ancestor recently. This 70 years of quiescence was broken 
by two discoveries in 2008—first, the report on SW1 from a 
beluga whale, with the largest coronavirus genome; and 
second, the discovery of a novel subgroup of coronavirus 
from birds of different families, with the smallest coronavirus 
genomes (Table 1) (28, 37). 

SW1 was discovered from the liver tissue of a dead 
beluga whale (28). It was the first reported group 3 
mammalian coronavirus with complete genome sequence 
and was phylogenetically distantly related to IBV. 
Uniquely, eight ORFs, occupying a 4105-base region, were 
observed between M and N, giving rise to the largest 
reported coronavirus genome. We propose that this lineage 
should be group 3b coronavirus, whereas the IBV and IBV- 
like viruses should be group 3a coronaviruses. 

The novel subgroup of avian coronaviruses, group 3c 
coronavirus, that we recently described consists of at least 
three members (BuCoV HKU11, ThCoV HKU12, and 
MuCoV HKU13), infecting at least three different families 
of birds (bulbuls, thrushes, and munias) (37). These 
coronaviruses were distantly related to IBV and SW1. Most 
interestingly, these three avian coronaviruses also 
clustered with a coronavirus recently discovered in the 
Asian leopard cat (ALC-CoV), for which the complete 
genome sequence was not available (70). From the 
sequences of the gene fragments available, it was observed 
that ALC-CoV probably also employed the same putative 
TRS, that NS6 is also present between M and N, and that a 
stem-loop II motif (s2m), a conserved RNA element 
downstream to N and upstream to the polyA tail, is also 
present. This represents the closest relationship between 
mammalian and avian coronaviruses reported so far, as the 
puffinosis virus, a group 2a coronavirus that had been found 
in birds, was considered to be a contaminating MHV as a 
result of its passage in mouse brains (71). Complete genome 
sequencing of ALC-CoV and comparative genomics studies 
may reveal the secret behind interclass jumping in 
coronaviruses. 

Figure 3. Examples of bats and birds in Hong Kong from which novel coronaviruses were discovered. Chinese horseshoe bat (Rhinolophus 
sinicus) (panel A), from which bat-SARS-CoV and bat-CoV HKU2 were discovered; Lesser bamboo bat (Tylonycteris pachypus) (panel B), from 
which bat-CoV HKU4 was discovered; Leschenault’s rousette (Rousettus leschenaulti) (panel C), from which bat-CoV HKU9 was discovered; 
Chinese Bulbul (Pycnonotus sinensis) (panel D) and Red-whiskered Bulbul (Pycnonotus jocosus) (panel E), from which BuCoV HKU11 was 
discovered; and Blackbird (Turdus merula) (panel F), from which ThCoV HKU12 was discovered. A color version of this figure is available in the 
online journal. 

Bat Coronaviruses as Gene Pool for Group 1 and 
Group 2 Coronaviruses and Avian Coronaviruses 
as Gene Pool for Group 3 Coronaviruses 

The discovery of bat-SARS-CoV has marked the 
beginning of the race of coronavirus hunting in bats (31, 
34). Among the 23 group 1 and group 2 coronaviruses with 
complete genome sequence available, 9 (39%) were from 
bats (Fig. 3). Furthermore, bats were also the hosts of 103 
(GenBank taxonomy data in Feb. 2009) additional coronaviruses, 
discovered in Asia, Europe, America, and Africa, 
although complete genome sequences were still not 
available (32, 35, 72-74). As for group 3 coronaviruses, 
they have been exclusively found in birds, with the 
exception of SW1 from the beluga whale and ALC-CoV 
from Asian Leopard cats (28, 70). As the race of 
coronavirus hunting in birds has just begun, we speculate 
that there are still many unrecognized coronaviruses in birds 
(Fig. 3). This diversity of coronaviruses in bats and birds 
could be related to the unique properties of these two groups 
of animals (75, 76). First, the diversity of bats and birds 
themselves is huge. Bats account for more than 20% of the 
4800 mammalian species recorded in the world. For 
example, although Hong Kong is an urbanized, subtropical 
city, it has extensive natural areas with more than 50 
different species of terrestrial mammals, with 40% of the 
species being bats. As for birds, this class contains around 
10,000 species, making them the most diverse tetrapod 
vertebrates; and in Hong Kong, there are more than 460 
different species of birds. This diversity of bats and birds 
would potentially provide a large number of different cell 
types for different coronaviruses. This is in line with the 
genus specificity for different bat and bird coronaviruses. 
For example, bat-SARS-CoV was found in Rhinolophus 
bats, bat-CoV HKU4 in Tylonycteris bats, bat-CoV HKU5 
in Pipistrellus bats, bat-CoV HKU9 in Rousettus bats, 
BuCoV HKU11 in Pycnonotus birds, ThCoV HKU12 in 
Turdus birds, and MuCoV HKU13 in Lonchura birds. 
Second, the ability to fly has given bats and birds the 
opportunity to go almost anywhere, free from obstacles 
faced by land-based mammals. Bats have been found at 
altitudes as high as 5000 m, and some birds can fly for over 
10,000 km in their journeys of long-distance non-stop 
migration. This ability of bats and birds would have allowed 
possible exchange of viruses and/or their genetic materials 
with different kinds of living organisms. Third, the different 
environmental pressures such as food, climate, shelter, and 
predators would have provided different selective pressures 
on parasitisation of different coronaviruses in different 
species of bats and birds. Fourth, the habit of roosting in 
bats and flocking in birds brings large numbers of bats 
and birds together. This would have also facilitated 
exchange of viruses among individual bats and birds. 

Figure 4. A model of coronavirus evolution. Coronaviruses in bats are the hypothesized gene pool of group 1 and group 2 coronaviruses and 
coronaviruses in birds are the hypothesized gene pool of group 3 coronaviruses. 

The huge diversity of coronaviruses in bats and birds 
has made them excellent gene pools for groups 1 and 2 
coronaviruses and group 3 coronaviruses, respectively (Fig. 
4). It has been proposed that bat coronaviruses were the 
gene pools of all three groups of coronaviruses (77). 
However, it seems that there is no evidence supporting this 
hypothesis because more than 100 bat coronaviruses have 
been discovered and still none of them belonged to group 3. 
Instead, the present evidence supports that bat coronaviruses 
are the gene pools of groups 1 and 2 coronaviruses, whereas 
bird coronaviruses are the gene pools of group 3 
coronaviruses. We speculate that the ancestor of the present 
coronaviruses infected a bat and it jumped from the bat to a 
bird, or alternatively, it infected a bird and it jumped from 
the bird to a bat, evolving dichotomously. On the one hand, 
the bat coronavirus jumped to another species of bat, giving 
rise to the group 1 and group 2 coronaviruses, evolving 
dichotomously. These bat coronaviruses in turn jumped to 
other bat species and other mammals, including humans, 
with each interspecies jumping evolving dichotomously. On 
the other hand, the bird coronavirus jumped to other species 
of birds, and occasionally to some specific mammalian 
species, such as whale and Asian Leopard cat, with each 
interspecies jumping evolving dichotomously, giving rise to 
the group 3 coronaviruses. The properties of bats and birds 
mentioned above have facilitated the generation of a huge 
diversity of bat and bird coronaviruses as well as 
dissemination to other animals. 

Concluding Remarks 

In the past six years of the 21st century, we have 
witnessed a drastic increase in the number of coronaviruses 
discovered and coronavirus genomes being sequenced. With 
this increase in the number of coronavirus species and 
genomes, we are starting to appreciate the diversity of 
coronaviruses. Databases for efficient sequence retrieval and 
the ever-improving bioinformatics tools have further 
enabled us to start to understand the phylogeny of 
coronaviruses and perform additional genomic analyses 
(78, 79). With the increasing number of coronaviruses, more 
and more closely related coronaviruses from distantly 
related animals have been observed. Examples included 
FCoV and CCoV in group la; MHV and rat coronavirus or 
HCoV-OC43, BCoV, and PHEV in group 2a; bat, civet 
SARS-CoV, and human SARS-CoV in group 2b; IBV and 
TCoV in group 3a; and the Asian Leopard cat coronavirus 
and the novel avian coronaviruses in group 3c. These were 
results of recent interspecies jumping and may be the cause 
of disastrous outbreaks of zoonotic diseases. Detailed 
analysis of their genomes, particularly the S protein 
sequences and structures, as well as the receptors for the 
individual coronaviruses, will enable rational design of 
experiments to understand the secret behind interspecies 
jumping at the molecular level. 

We are grateful to Chung-Tong Shek for providing the photos of bats 
and Rex K. H. Au-Yeung for providing the photos of birds. 